fix(node): close under-withholding via full ref scope and reachable-only pin set by beardthelion · Pull Request #46 · Gitlawb/node

beardthelion · 2026-06-13T02:53:45Z

Closes #42.

blob_paths in visibility_pack.rs builds the blob→path map that every withholding decision rests on. Two flaws let it produce an incomplete withheld set, which downstream code then treats as public, leaking withheld content in cleartext:

Ref-namespace gap. It walked only refs/heads/* and refs/tags/*, but the serve path enumerates git rev-list --objects --all (every object reachable from any ref). A blob reachable only through another namespace (e.g. refs/notes/*) never entered the withheld set and shipped in cleartext. Fix: drop the heads/tags filter and walk every ref, still per-ref via git ls-tree -r so all paths a blob lives at are seen (a blob is withheld only if denied at every path; if it is public at some allowed path it must still be sent). Note this is deliberately not rev-list --objects, which reports one representative path per object and would regress that correctness.
Fail-open ref walk. A non-zero git ls-tree exit was silently skipped, yielding a partial set. Fix: fail closed (bail!). All three callers already abort on error (serve propagates, push/encrypt-pin skip the block), so no path serves unfiltered.

It also narrows the two pin enumerators (ipfs_pin, pinata) from git cat-file --batch-all-objects to git rev-list --objects --all. The old call included unreachable/dangling objects, which have no ref and no tree path, so visibility rules cannot classify them; a dangling copy of a secret blob would bypass withholding straight into a permanent IPFS pin. Reachable-only keeps the pin set aligned with what blob_paths can evaluate.

Four tests added: a blob reachable only via a non-head/tag ref is still withheld; an untraversable ref makes the withheld computation return Err; and a dangling blob is absent from each pin enumerator while a committed blob is present.

Stacks on #40 (base=main), carrying the cumulative Phase-2/3 prereq diff since these files exist only in the unmerged stack.

Summary by CodeRabbit

New Features
- Added encrypted storage and retrieval for withheld blobs
- New gl clone command supporting partial clones with content filtering and automatic blob recovery
- Added API endpoints for querying encrypted blobs and withheld paths
- Enhanced sync functionality to replicate encrypted content across mirrors

coderabbitai · 2026-06-13T02:53:57Z

Warning

Review limit reached

@beardthelion, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 49 minutes and 37 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2585143e-8a5c-422b-a692-8a517b744955

📥 Commits

Reviewing files that changed from the base of the PR and between cc30427 and cd5cda4.

📒 Files selected for processing (9)

crates/gitlawb-node/src/api/encrypted.rs
crates/gitlawb-node/src/api/repos.rs
crates/gitlawb-node/src/arweave.rs
crates/gitlawb-node/src/db/mod.rs
crates/gitlawb-node/src/encrypted_pin.rs
crates/gitlawb-node/src/git/visibility_pack.rs
crates/gitlawb-node/src/ipfs_pin.rs
crates/gitlawb-node/src/pinata.rs
crates/gitlawb-node/src/sync.rs

📝 Walkthrough

Walkthrough

This PR implements end-to-end encryption and withholding for Phase 3 confidentiality: withheld blobs are sealed to recipient keys via ephemeral X25519 envelopes, encrypted with XChaCha20Poly1305, pinned to IPFS with opaque recipient tags, anchored to Arweave, replicated across mirrors using promisor-based filtering, and recovered locally via a new gl clone command with node and Arweave fallback paths.

Changes

Core encryption and visibility foundations

Layer / File(s)	Summary
Envelope encryption with X25519 key agreement `crates/gitlawb-core/src/encrypt.rs`, `crates/gitlawb-core/Cargo.toml`	New `seal_blob`/`open_blob` module implements multi-recipient envelope encryption: `seal_blob` generates random content key, encrypts plaintext with XChaCha20Poly1305, wraps content key per recipient via ephemeral X25519 and crypto_box, and outputs magic+version+header length+blinded JSON header+encrypted body. `open_blob` validates envelope structure, derives X25519 secret from keypair seed, attempts AEAD decryption per recipient entry, and decrypts body. Tests cover key agreement, round-trips, tampering, header blinding, and nonce-length protection. Dependencies: `curve25519-dalek`, `crypto_box`, `chacha20poly1305`.
Keypair seed exposure and module wiring `crates/gitlawb-core/src/identity.rs`, `crates/gitlawb-core/src/lib.rs`, `crates/gitlawb-node/Cargo.toml`	`Keypair::seed_bytes()` returns raw 32-byte Ed25519 seed for X25519 secret derivation. Encrypt module exported in crate root. Node workspace adds `ed25519-dalek` dependency.
Visibility-aware blob OID and recipient computation `crates/gitlawb-node/src/git/visibility_pack.rs`	New module enumerates reachable blobs via `git ls-tree -r` per ref, computes withheld OID sets by evaluating visibility rules at each blob path (fail-closed on ref-walk errors), filters object lists for replication, and derives per-blob recipient DIDs as owner_did + reader DIDs allowed at any path. Comprehensive tests cover anonymous/owner/reader contexts, recipient computation, seal/open round-trips, and nonstandard ref namespaces.
Sparse checkout glob helpers and announce gate logic `crates/gitlawb-node/src/visibility.rs`	New `withheld_globs` and `reincluded_globs` functions compute glob patterns for sparse-checkout exclusion/re-inclusion. Tests validate announce-gate logic (anonymous read of "/" gates replication announcement).

Server-side filtered pack serving and encrypted blob management

Layer / File(s)	Summary
Filtered pack generation and git protocol v0 framing `crates/gitlawb-node/src/git/smart_http.rs`	New `build_filtered_pack` shells out to rev-list/pack-objects to exclude withheld OIDs. `upload_pack_excluding` serves filtered packs in blocking task with protocol v0 NAK framing, optional side-band-64k band-1 wrapping, and comprehensive unit/integration tests using local Axum server, real `git clone --filter=blob:none`, and incremental fetch scenarios.
Upload-pack visibility enforcement in repos.rs `crates/gitlawb-node/src/api/repos.rs` (partial)	`git_upload_pack` computes withheld blob OIDs via spawn_blocking, routes to normal or filtered pack handler, and remaps errors (protocol issues → BadRequest, git failures → Git).
Encrypt-and-pin with recipient derivation and DB recording `crates/gitlawb-node/src/encrypted_pin.rs`, `crates/gitlawb-node/src/main.rs`	New module computes opaque HMAC-SHA256 `recipients_tag` (node-keyed, order-insensitive); skips re-sealing when tag matches DB record; otherwise resolves recipient DIDs to keys, reads blob bytes, seals via core encrypt module, pins envelope to IPFS, and records (oid, cid, tag). Best-effort per-blob error handling with logging. Tests validate tag properties and keying.
IPFS and Pinata object filtering and CID retrieval `crates/gitlawb-node/src/ipfs_pin.rs`, `crates/gitlawb-node/src/pinata.rs`	Both modules now accept `withheld: &HashSet<String>` parameter, filter enumerated objects through `replicable_objects`, and update `list_all_objects` to use `git rev-list --objects --all` instead of batch-all-objects (only reachable objects). IPFS adds new `cat` function to fetch CID bytes. Tests assert unreachable blobs excluded.
Per-push encrypted manifest anchoring to Arweave `crates/gitlawb-node/src/arweave.rs`	New `EncryptedManifest` struct and `anchor_encrypted_manifest` function build JSON payload with repo/owner/node/timestamp metadata and blob entries (oid, cid only; no recipients), send to Irys endpoint, return transaction id. No-ops when irys_url empty or blob list empty. Tests mock successful anchoring and validate missing recipient fields.
Receive-pack Phase 2 replication enforcement and propagation gating `crates/gitlawb-node/src/api/repos.rs` (main change)	New replication logic in `git_receive_pack` determines `announce` eligibility and optional `withheld` set from visibility rules. When announcable, conditionally spawns IPFS pinning of new objects, calls `encrypt_and_pin`, and attempts manifest anchoring (all best-effort). Downstream gossipsub and HTTP peer sync notifications are suppressed unless `announce` true. Arweave permanent anchoring gated by `announce && !irys_url.is_empty()`.
Database migrations and encrypted blob query/write methods `crates/gitlawb-node/src/db/mod.rs`	Schema v4 adds `encrypted_blobs(repo_id, oid)` with cid and recipients column plus index. Schema v5 drops recipients column, adds recipients_tag with non-null default. Db impl adds `record_encrypted_blob` upsert, `list_all_encrypted_blobs`, `encrypted_blob_cid`, and `encrypted_blob_recipients_tag` read methods.
REST API endpoints for encrypted blob operations `crates/gitlawb-node/src/api/encrypted.rs`, `crates/gitlawb-node/src/api/mod.rs`, `crates/gitlawb-node/src/api/visibility.rs`	New handlers: `list_encrypted_blobs` returns {oid, cid} pairs visible to caller (visibility-gated); `get_encrypted_blob` fetches encrypted envelope bytes from IPFS; `replicate_encrypted_blobs` returns replication-format JSON (no recipients); `withheld_paths` returns withheld and reinclude globs (whole-repo visibility gate, no reader_dids). Unit tests validate JSON schema.
HTTP router registration for encrypted blob and visibility endpoints `crates/gitlawb-node/src/server.rs`	New GET routes under /api/v1/repos/{owner}/{repo}/ for withheld-paths, encrypted-blobs list/get/replicate wired to visibility and encrypted handlers.

Mirror mode classification and encrypted blob replication

Layer / File(s)	Summary
Mirror mode selection and withheld-paths fetching `crates/gitlawb-node/src/sync.rs` (Part 1)	Introduces `MirrorMode` enum (Plain vs Promisor), `fetch_withheld` HTTP query to origin `/withheld-paths` (returns None on non-success/network/parse), and `classify_mirror` to select strategy. Also adds `ReplicaBlob`/`ReplicateResponse` types and `blobs_needing_replication` to decide which envelopes need (re)replication. Reusable `reqwest::Client` with 30s timeout passed through batch processing.
Encrypted blob replication for mirrors (Option B2) `crates/gitlawb-node/src/sync.rs` (Part 2)	New `replicate_encrypted_blobs` function fetches blob listings from origin, compares against local DB state by OID, pulls missing/changed envelopes via IPFS cat, validates CID match, and records rows keyed by mirror repo_id. Best-effort per-blob with logging; no-ops when IPFS unconfigured, endpoint absent, or no blobs returned.
Promisor clone/fetch with mode transitions `crates/gitlawb-node/src/sync.rs` (Part 3)	Updates `clone_repo` and `fetch_repo` to accept `MirrorMode`, apply `--filter=blob:limit=10g` and promisor/partialclonefilter config for promisor mode, and detect/unset promisor settings on transition back to plain mode (refetch with --refetch --prune). Helpers: `git_run` (stderr on error), `git_run_lenient` (ignore failures), `git_config_get` for promisor detection. Tests cover mode classification, replication decision, and promisor↔plain transitions.
Sync worker loop integration and client construction `crates/gitlawb-node/src/sync.rs` (Part 4)	Per-item sync calls `fetch_withheld`, classifies mirror mode, routes to clone/fetch variants, then calls `replicate_encrypted_blobs`. Signature changes to `process_batch` accept client parameter.

Client-side clone command with recovery paths

Layer / File(s)	Summary
CLI parsing, partial clone, and sparse checkout configuration `crates/gl/src/clone.rs`, `crates/gl/src/main.rs`	New `CloneArgs` struct. `setup_partial_clone` clones with `--filter=blob:none --no-checkout`, writes non-cone sparse-checkout excluding withheld globs and re-including nested allowed paths, performs branch checkout (defaults to origin/HEAD). `parse_repo` normalizes gitlawb:/// and bare / formats (supports DID owners with colons). Main adds Clone subcommand routing.
Node-based encrypted blob recovery via HTTP `crates/gl/src/clone.rs` (Part 1)	`fetch_withheld` queries node `/withheld-paths`, selecting signed vs unsigned endpoint based on local key, returning empty globs on 404/501. `recover_encrypted_blobs` fetches `/encrypted-blobs` list and per-blob envelopes, decrypts via `open_blob`, installs via `hash-object -w`, verifies OID match, records repo-relative paths. Best-effort per-blob error handling.
Arweave/IPFS fallback recovery with manifest discovery `crates/gl/src/clone.rs` (Part 2)	Types: `ManifestBlob`, `Manifest` for deserialization. `parse_tx_ids` extracts Arweave GraphQL transaction ids. `recover_from_arweave` discovers encrypted manifest transactions via GraphQL, fetches/parses each manifest, merges by latest RFC3339 timestamp, then for each missing object (guarded by GIT_NO_LAZY_FETCH) downloads envelope from IPFS CID endpoint, decrypts with caller key, installs/verifies blob OIDs. Best-effort per-blob; skips on gateway/parse/timeout failures.
Clone flow orchestration and sparse-checkout materialization `crates/gl/src/clone.rs` (Part 3)	`run()` function validates destination non-existence, fetches withheld/reinclude globs, performs partial clone setup, conditionally runs node-based recovery then Arweave fallback (when keypair available), updates sparse-checkout to re-materialize recovered paths for sparse clones, attempts checkout, prints messages. Comprehensive tests cover sparse patterns, response parsing, manifest merging, and end-to-end Arweave/IPFS scenarios with mocked gateways.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Gitlawb/node#42: Raised Phase 3 withholding issues with incomplete withheld sets from ref-namespace scope (heads/tags only) and fail-open ref traversal that could leak cleartext blobs; this PR's visibility_pack.rs implementation enumerates blobs only from heads/tags and skips ref traversal errors silently, directly matching the issue's failure modes.
Gitlawb/node#41: Calls for hardening Keypair::seed_bytes() to return a zeroizing wrapper; this PR exposes the method for X25519 derivation but does not yet zeroize the returned bytes, leaving the seed in memory.

Possibly related PRs

Gitlawb/node#25: Introduced path-scoped visibility rules and visibility_check logic; this PR extends that work by adding withheld_globs/reincluded_globs helpers and a new /withheld-paths endpoint that gate encrypted blob withholding and replication.

Suggested reviewers

kevincodex1

Poem

🐰 Blobs wrapped in X25519 envelopes bright,
Withheld by the rules, hidden from sight,
Pinned secret to IPFS, sealed tight,
Arweave anchors guide the recovery flight,
Clone sparse and recover—all's right! 🔐

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main fix: closing under-withholding by improving ref scope coverage and using reachable-only pin sets.
Linked Issues check	✅ Passed	All coding requirements from issue `#42` are met: full-ref enumeration replaces heads/tags-only walk in visibility_pack.rs, ref-walk failures now fail closed, and pin enumerators are aligned with reachable graph via git rev-list.
Out of Scope Changes check	✅ Passed	All changes directly support fixing the incomplete withheld set vulnerability. Encryption/decryption support and new APIs are necessary infrastructure for the Phase 2/3 replication and withholding system being deployed.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (4)

crates/gl/src/clone.rs (1)

262-267: ⚡ Quick win

Parse /encrypted-blobs into a typed response instead of Value.

A missing or mistyped blobs field currently degrades to [] and silently disables node-side recovery. Making this schema-strict would catch server/client contract drift the same way WithheldPathsResponse already does.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gl/src/clone.rs` around lines 262 - 267, Replace the loose
serde_json::Value parsing for resp in clone.rs with a strongly typed
EncryptedBlobsResponse struct so schema mismatches surface as errors: define
EncryptedBlobsResponse { blobs: Vec<EncryptedBlobOrString> } (matching the
server shape, analogous to WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).

crates/gitlawb-node/src/api/encrypted.rs (1)

109-123: ⚡ Quick win

Add a handler test for the 404/404 access-control contract.

The new test only locks the wire shape. A small handler-level test that exercises both “caller denied” and “OID missing” on get_encrypted_blob would protect the deliberate non-oracle behavior from future refactors.

Based on learnings, this endpoint should keep “not found” and “not authorized” externally indistinguishable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 109 - 123, Add a
handler-level test for the access-control contract of get_encrypted_blob that
verifies the external response is indistinguishable when the caller is denied
versus when the OID is missing: write two tests (or one parameterized test)
invoking the get_encrypted_blob handler — one scenario where the auth check
fails (simulate "caller denied") and one where the blob store returns NotFound
(simulate "OID missing") — and assert both responses have the same HTTP status
and body (e.g., both return 404/not-found) to ensure the endpoint does not leak
authorization state; reference the get_encrypted_blob handler in your test and
reuse existing test scaffolding in the encrypted.rs tests module.
Source: Learnings

crates/gitlawb-core/src/encrypt.rs (1)

66-88: ⚡ Quick win

Zeroize transient key material in seal_blob and open_blob.

content_key, the seed-derived X25519 secret, and the unwrapped content key all live in plain stack/heap buffers until drop. In this confidentiality path that is avoidable exposure; keep them in Zeroizing (or explicitly call zeroize()) once the ciphers are initialized.

Hardening sketch

+use zeroize::Zeroizing;
+
-    let mut content_key = [0u8; 32];
-    OsRng.fill_bytes(&mut content_key);
+    let mut content_key = Zeroizing::new([0u8; 32]);
+    OsRng.fill_bytes(&mut *content_key);
     let body_cipher = XChaCha20Poly1305::new_from_slice(&content_key)
         .map_err(|e| anyhow::anyhow!("content key: {e}"))?;
...
-    let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes()));
+    let seed = keypair.to_seed();
+    let my_x = XSecret::from(x25519_secret_from_seed(&seed));
...
-    let content_key = content_key.context("not a recipient of this envelope")?;
+    let content_key = Zeroizing::new(content_key.context("not a recipient of this envelope")?);

Also applies to: 126-167

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-core/src/encrypt.rs` around lines 66 - 88, In seal_blob and
open_blob, sensitive key material (the 32-byte content_key, the ephemeral
XSecret `eph`, the seed-derived X25519 secret, and any unwrapped content key
buffers) must be zeroed after use: wrap stack/heap buffers in zeroize::Zeroizing
(e.g., Zeroizing<[u8;32]>, Zeroizing<Vec<u8>>) or explicitly call zeroize() on
types that implement Zeroize immediately after the ciphers are initialized and
after encryption/decryption completes; ensure you zero `content_key`, the
ephemeral secret `eph` (or call its zeroize method), the seed-derived X25519
secret produced by x25519_private/related variable, and the unwrapped content
key before returning or pushing into Recipient so no plaintext key remains in
memory.

crates/gitlawb-node/src/git/visibility_pack.rs (1)

101-130: 🏗️ Heavy lift

Avoid re-traversing the repo to derive recipients.

withheld_blob_recipients() does one full blob_paths() sweep through withheld_blob_oids() and then a second full blob_paths() sweep for the recipient pass. On the push path, git_receive_pack() has already computed the withheld set before it gets here, so repos with visibility rules now pay up to three complete ls-tree traversals per push. Threading the precomputed withheld/blob-path data into this API would keep recipient derivation single-pass.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 101 - 130,
withheld_blob_recipients currently re-traverses the repo by calling
withheld_blob_oids(...) and then blob_paths(...), causing multiple ls-tree
sweeps; change withheld_blob_recipients signature to accept precomputed withheld
data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The new seed_bytes() method leaks a non-zeroizing copy of
the Ed25519 seed; replace its usage with the existing to_seed() which returns a
Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g.
Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code
paths that call seed_bytes() (including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.

In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-425: Short-circuit the expensive withheld walk by detecting
when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is
empty or only applies to root) before calling
visibility_pack::withheld_blob_oids; if that check shows no possible withheld
blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.

In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 77-80: Replace silent `continue` branches when
`crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail
with warning logs that include contextual details (repo_path, oid, and the
error/result) so failures are observable; in the match arms around `let data =
match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent
IPFS pin match, call the module’s logger (e.g., process_logger.warn or
log::warn) with a concise message and the error/result, then continue as before
to preserve behavior.
- Around line 72-76: The current code builds `keys: Vec<VerifyingKey>` by
`filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and
proceeds to encrypt to a partial recipient set; change this so any unresolved
DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and
detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()`
after mapping via `did_to_key`), and when a mismatch is found log a warning
referencing `oid` (similar to the existing log) and `continue` so you do not
seal to a partial set; ensure you keep the `VerifyingKey` handling consistent
with downstream code that expects `keys`.

In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-85: The cat function creates a reqwest::Client with no timeout
(reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker;
modify cat (or callers) to use a client with a bounded timeout (e.g., build a
Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and
reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.

In `@crates/gitlawb-node/src/sync.rs`:
- Around line 181-188: The code currently treats a None result from
fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error
states; instead, check the Option returned by fetch_withheld and only set
MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None
preserve the repository's existing promisor/partialclone state (do not call
classify_mirror on None). Concretely, change the branch around
fetch_withheld/classify_mirror to: if withheld.is_some() { let mode =
classify_mirror(withheld); } else { read the current repo promisor/partialclone
setting from the local repo at local_path (or otherwise keep the current
MirrorMode) and use that mode for fetch_repo/clone_repo }, so
fetch_repo/clone_repo are invoked with the preserved mode rather than
unconditionally using MirrorMode::Plain.

In `@crates/gl/src/clone.rs`:
- Around line 428-459: The current GraphQL fetch (the local `query` +
`gql_body`, use of `client.post(...).json(&gql_body).send().await`, and
subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing
missing manifests; change the logic to paginate: rewrite the GraphQL query to
take variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
- Around line 415-416: The slug normalization and Arweave discovery/parsing need
tightening: ensure owner short-form uses the same split logic as the node by
keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming
slug = format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.

---

Nitpick comments:
In `@crates/gitlawb-core/src/encrypt.rs`:
- Around line 66-88: In seal_blob and open_blob, sensitive key material (the
32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519
secret, and any unwrapped content key buffers) must be zeroed after use: wrap
stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>,
Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize
immediately after the ciphers are initialized and after encryption/decryption
completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call
its zeroize method), the seed-derived X25519 secret produced by
x25519_private/related variable, and the unwrapped content key before returning
or pushing into Recipient so no plaintext key remains in memory.

In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 109-123: Add a handler-level test for the access-control contract
of get_encrypted_blob that verifies the external response is indistinguishable
when the caller is denied versus when the OID is missing: write two tests (or
one parameterized test) invoking the get_encrypted_blob handler — one scenario
where the auth check fails (simulate "caller denied") and one where the blob
store returns NotFound (simulate "OID missing") — and assert both responses have
the same HTTP status and body (e.g., both return 404/not-found) to ensure the
endpoint does not leak authorization state; reference the get_encrypted_blob
handler in your test and reuse existing test scaffolding in the encrypted.rs
tests module.

In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 101-130: withheld_blob_recipients currently re-traverses the repo
by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple
ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed
withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.

In `@crates/gl/src/clone.rs`:
- Around line 262-267: Replace the loose serde_json::Value parsing for resp in
clone.rs with a strongly typed EncryptedBlobsResponse struct so schema
mismatches surface as errors: define EncryptedBlobsResponse { blobs:
Vec<EncryptedBlobOrString> } (matching the server shape, analogous to
WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fab81d81-dc64-46fe-b486-709131c8a108

📥 Commits

Reviewing files that changed from the base of the PR and between a2217bf and cc30427.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (23)

crates/gitlawb-core/Cargo.toml
crates/gitlawb-core/src/encrypt.rs
crates/gitlawb-core/src/identity.rs
crates/gitlawb-core/src/lib.rs
crates/gitlawb-node/Cargo.toml
crates/gitlawb-node/src/api/encrypted.rs
crates/gitlawb-node/src/api/mod.rs
crates/gitlawb-node/src/api/repos.rs
crates/gitlawb-node/src/api/visibility.rs
crates/gitlawb-node/src/arweave.rs
crates/gitlawb-node/src/db/mod.rs
crates/gitlawb-node/src/encrypted_pin.rs
crates/gitlawb-node/src/git/mod.rs
crates/gitlawb-node/src/git/smart_http.rs
crates/gitlawb-node/src/git/visibility_pack.rs
crates/gitlawb-node/src/ipfs_pin.rs
crates/gitlawb-node/src/main.rs
crates/gitlawb-node/src/pinata.rs
crates/gitlawb-node/src/server.rs
crates/gitlawb-node/src/sync.rs
crates/gitlawb-node/src/visibility.rs
crates/gl/src/clone.rs
crates/gl/src/main.rs

coderabbitai · 2026-06-13T03:07:57Z

+    /// The raw 32-byte Ed25519 seed. Used to derive the X25519 secret for
+    /// envelope decryption (see `crate::encrypt`).
+    pub fn seed_bytes(&self) -> [u8; 32] {
+        self.signing_key.to_bytes()
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid adding a non-zeroizing raw-seed accessor.

to_seed() already exposes this material behind Zeroizing; seed_bytes() reintroduces the same long-lived signing seed as a plain [u8; 32]. That copy now flows into open_blob and the async encrypted-pin path, which widens private-key lifetime without adding any new capability. Prefer reusing to_seed() here, or make this helper return a zeroizing wrapper instead of raw bytes. This already shows up in crates/gitlawb-node/src/api/repos.rs:659-707, where the seed is moved into the spawned encrypted-pin task.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, The new seed_bytes() method leaks a non-zeroizing copy of the Ed25519 seed; replace its usage with the existing to_seed() which returns a Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g. Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code paths that call seed_bytes() (including open_blob and the encrypted-pin task in crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to call to_seed() directly so the raw seed is never held in plain memory.

coderabbitai · 2026-06-13T03:07:58Z

+    // withheld_blob_oids walks every ref with blocking `git ls-tree`; keep that
+    // off the async worker thread.
+    let withheld = {
+        let path = disk_path.clone();
+        let rules = rules.clone();
+        let owner_did = record.owner_did.clone();
+        let caller_owned = caller.map(str::to_string);
+        let is_public = record.is_public;
+        tokio::task::spawn_blocking(move || {
+            visibility_pack::withheld_blob_oids(
+                &path,
+                &rules,
+                is_public,
+                &owner_did,
+                caller_owned.as_deref(),
+            )
+        })
        .await
-        .map_err(|e| {
-            let msg = e.to_string();
-            if msg.contains("bad line length") || msg.contains("protocol error") {
-                tracing::warn!(repo = %name, err = %msg, "git-upload-pack: bad client request");
-                AppError::BadRequest(msg)
-            } else {
-                tracing::error!(repo = %name, err = %msg, "git-upload-pack failed");
-                AppError::Git(msg)
-            }
-        })?;
+        .map_err(|e| AppError::Git(e.to_string()))?
+        .map_err(|e| AppError::Git(e.to_string()))?
+    };
+
+    let resp = if withheld.is_empty() {
+        smart_http::upload_pack(&disk_path, body).await
+    } else {
+        tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack");
+        smart_http::upload_pack_excluding(&disk_path, body, &withheld).await
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip the withheld walk when no path-scoped rule can withhold anything.

After the "/" gate above, repos whose rule set is empty or root-only cannot produce a non-empty withheld blob set, but this branch still runs the full withheld_blob_oids() ref walk before falling back to the ordinary upload_pack(). That adds an extra full repo scan to every clone/fetch on the common public path.

💡 Minimal short-circuit

- let withheld = { - let path = disk_path.clone(); - let rules = rules.clone(); - let owner_did = record.owner_did.clone(); - let caller_owned = caller.map(str::to_string); - let is_public = record.is_public; - tokio::task::spawn_blocking(move || { - visibility_pack::withheld_blob_oids( - &path, - &rules, - is_public, - &owner_did, - caller_owned.as_deref(), - ) - }) - .await - .map_err(|e| AppError::Git(e.to_string()))? - .map_err(|e| AppError::Git(e.to_string()))? - }; - - let resp = if withheld.is_empty() { + let has_path_scoped_rules = rules.iter().any(|r| r.path_glob != "/"); + let resp = if !has_path_scoped_rules { smart_http::upload_pack(&disk_path, body).await } else { - tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack"); - smart_http::upload_pack_excluding(&disk_path, body, &withheld).await + let withheld = { + let path = disk_path.clone(); + let rules = rules.clone(); + let owner_did = record.owner_did.clone(); + let caller_owned = caller.map(str::to_string); + let is_public = record.is_public; + tokio::task::spawn_blocking(move || { + visibility_pack::withheld_blob_oids( + &path, + &rules, + is_public, + &owner_did, + caller_owned.as_deref(), + ) + }) + .await + .map_err(|e| AppError::Git(e.to_string()))? + .map_err(|e| AppError::Git(e.to_string()))? + }; + if withheld.is_empty() { + smart_http::upload_pack(&disk_path, body).await + } else { + tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack"); + smart_http::upload_pack_excluding(&disk_path, body, &withheld).await + } } .map_err(|e| {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 398 - 425, Short-circuit the expensive withheld walk by detecting when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is empty or only applies to root) before calling visibility_pack::withheld_blob_oids; if that check shows no possible withheld blobs, skip the tokio::task::spawn_blocking(...) and call smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to spawn_blocking and then choose between smart_http::upload_pack and smart_http::upload_pack_excluding based on the resulting withheld set; use the existing variables and functions (rules, visibility_pack::withheld_blob_oids, withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to implement the early return.

coderabbitai · 2026-06-13T03:07:58Z

+        let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect();
+        if keys.is_empty() {
+            tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin");
+            continue;
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not encrypt to a partial recipient set when some DIDs fail key resolution.

At Line 72, filter_map drops unresolved DIDs and still seals for the remaining keys. That can make authorized readers unrecoverable even though the blob is recorded as successfully encrypted/pinned. Treat any unresolved DID as a per-blob failure and skip instead of partially sealing.

💡 Suggested fix

- let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect(); - if keys.is_empty() { - tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin"); + let mut keys = Vec::with_capacity(dids.len()); + let mut unresolved = Vec::new(); + for did in dids { + match did_to_key(did) { + Some(k) => keys.push(k), + None => unresolved.push(did.clone()), + } + } + if !unresolved.is_empty() { + tracing::warn!(oid = %oid, unresolved = ?unresolved, "some recipient DIDs could not be resolved; skipping encrypted pin"); + continue; + } + if keys.is_empty() { + tracing::warn!(oid = %oid, "empty recipient set; skipping encrypted pin"); continue; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 72 - 76, The current code builds `keys: Vec<VerifyingKey>` by `filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and proceeds to encrypt to a partial recipient set; change this so any unresolved DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()` after mapping via `did_to_key`), and when a mismatch is found log a warning referencing `oid` (similar to the existing log) and `continue` so you do not seal to a partial set; ensure you keep the `VerifyingKey` handling consistent with downstream code that expects `keys`.

coderabbitai · 2026-06-13T03:07:58Z

+        let data = match crate::git::store::read_object(repo_path, oid) {
+            Ok(Some((_t, bytes))) => bytes,
+            _ => continue,
+        };


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add warnings for object-read and IPFS pin failures instead of silent skips.

At Line 77 and Line 88, failures currently continue without logs. This makes production retry/debugging hard and weakens observability during partial outages.

💡 Suggested fix

- let data = match crate::git::store::read_object(repo_path, oid) { - Ok(Some((_t, bytes))) => bytes, - _ => continue, - }; + let data = match crate::git::store::read_object(repo_path, oid) { + Ok(Some((_t, bytes))) => bytes, + Ok(None) => { + tracing::warn!(oid = %oid, "blob object not found locally; skipping encrypted pin"); + continue; + } + Err(e) => { + tracing::warn!(oid = %oid, err = %e, "failed reading blob object; skipping encrypted pin"); + continue; + } + }; @@ - let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await { - Ok(c) if !c.is_empty() => c, - _ => continue, - }; + let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await { + Ok(c) if !c.is_empty() => c, + Ok(_) => { + tracing::warn!(oid = %oid, "pin_git_object returned empty CID; skipping record"); + continue; + } + Err(e) => { + tracing::warn!(oid = %oid, err = %e, "pin_git_object failed; skipping encrypted pin"); + continue; + } + };

Also applies to: 88-90

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 77 - 80, Replace silent `continue` branches when `crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail with warning logs that include contextual details (repo_path, oid, and the error/result) so failures are observable; in the match arms around `let data = match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent IPFS pin match, call the module’s logger (e.g., process_logger.warn or log::warn) with a concise message and the error/result, then continue as before to preserve behavior.

coderabbitai · 2026-06-13T03:07:58Z

+/// Fetch raw bytes for a CID from the local Kubo node (`/api/v0/cat`).
+pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
+    if ipfs_api.is_empty() {
+        return Err(anyhow::anyhow!("IPFS not configured"));
+    }
+    let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid);
+    let resp = reqwest::Client::new().post(&url).send().await?;
+    if !resp.status().is_success() {
+        return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status()));
+    }
+    Ok(resp.bytes().await?.to_vec())


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound the Kubo /cat request.

This new helper creates a fresh reqwest::Client with no timeout, so a slow or stalled Kubo fetch can hang both get_encrypted_blob and the sync worker indefinitely. Please reuse a client with a bounded timeout, or set a per-request timeout here.

⏱️ Suggested fix

pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> { if ipfs_api.is_empty() { return Err(anyhow::anyhow!("IPFS not configured")); } let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid); - let resp = reqwest::Client::new().post(&url).send().await?; + let resp = reqwest::Client::new() + .post(&url) + .timeout(std::time::Duration::from_secs(30)) + .send() + .await?; if !resp.status().is_success() { return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status())); } Ok(resp.bytes().await?.to_vec()) }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 75 - 85, The cat function creates a reqwest::Client with no timeout (reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker; modify cat (or callers) to use a client with a bounded timeout (e.g., build a Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on slow/stalled Kubo; update the cat signature or module to accept/reuse a Client or enforce a timeout around resp = client.post(&url).send().await and resp.bytes().await to avoid indefinite hangs.

coderabbitai · 2026-06-13T03:07:58Z

+        let withheld = fetch_withheld(client, &origin_url, owner_short, repo_name).await;
+        let mode = classify_mirror(withheld);
+
        let result = if local_path.exists() {
-            fetch_repo(&local_path, &remote_url).await
+            fetch_repo(&local_path, &remote_url, mode).await
        } else {
-            clone_repo(&remote_url, &local_path).await
+            clone_repo(&remote_url, &local_path, mode).await
        };


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t downgrade an unknown mirror state to Plain.

fetch_withheld collapses 404, 5xx, network, and parse failures into None, and this branch always turns that into MirrorMode::Plain. For an existing promisor mirror, that flows into fetch_repo(..., MirrorMode::Plain), which clears remote.origin.promisor and remote.origin.partialclonefilter before refetching. A transient withheld-paths outage on a still-withheld repo will therefore start failing syncs until the endpoint recovers. Only downgrade on Some(empty), and preserve the current promisor mode when the lookup is unknown.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/sync.rs` around lines 181 - 188, The code currently treats a None result from fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error states; instead, check the Option returned by fetch_withheld and only set MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None preserve the repository's existing promisor/partialclone state (do not call classify_mirror on None). Concretely, change the branch around fetch_withheld/classify_mirror to: if withheld.is_some() { let mode = classify_mirror(withheld); } else { read the current repo promisor/partialclone setting from the local repo at local_path (or otherwise keep the current MirrorMode) and use that mode for fetch_repo/clone_repo }, so fetch_repo/clone_repo are invoked with the preserved mode rather than unconditionally using MirrorMode::Plain.

coderabbitai · 2026-06-13T03:07:58Z

+    let owner_short = owner.split(':').next_back().unwrap_or(owner);
+    let slug = format!("{owner_short}/{name}");


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Client-side Arweave lookup normalization:" rg -n -C2 'owner_short|slug = format!|graphql' crates/gl/src/clone.rs echo echo "Server-side encrypted-manifest tag construction:" rg -n -C4 'encrypted-manifest|Schema|Repo|App-Name' crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src/api/repos.rs

Repository: Gitlawb/node

Length of output: 20700

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Find EncryptedManifest and where `manifest.repo` is set:" rg -n "struct EncryptedManifest|manifest\.repo|EncryptedManifest<'_>" crates/gitlawb-node/src/arweave.rs echo echo "Search for `repo:` field construction / slug derivation in node code:" rg -n "Repo\s*[:=]|repo\s*:\s*|format!\(.*owner.*name|owner.*split\(':'\)|next_back\(\)" crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src -S

Repository: Gitlawb/node

Length of output: 10904

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== crates/gitlawb-node/src/arweave.rs (top + EncryptedManifest + tag header) ===" nl -ba crates/gitlawb-node/src/arweave.rs | sed -n '1,260p' echo echo "=== crates/gl/src/clone.rs (Arweave fallback slug + GraphQL variables) ===" nl -ba crates/gl/src/clone.rs | sed -n '360,520p'

Repository: Gitlawb/node

Length of output: 199

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "arweave.rs lines:" wc -l crates/gitlawb-node/src/arweave.rs echo echo "crates/gitlawb-node/src/arweave.rs: EncryptedManifest + manifest_json payload + build_manifest_tags_header + any construction sites:" cat -n crates/gitlawb-node/src/arweave.rs | sed -n '1,260p' echo echo "clone.rs: Arweave fallback slug + GraphQL body:" wc -l crates/gl/src/clone.rs echo cat -n crates/gl/src/clone.rs | sed -n '380,480p'

Repository: Gitlawb/node

Length of output: 15653

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Call sites for anchor_encrypted_manifest / EncryptedManifest construction ===" rg -n "anchor_encrypted_manifest|EncryptedManifest\s*\{" crates/gitlawb-node/src echo echo "=== Inspect around likely repo_slug construction (owner_short + repo_name) in api/repos.rs ===" rg -n "owner_short\s*=.*split\(':'\)\.next_back\(\)\.unwrap_or|repo_slug\s*=\s*format!" crates/gitlawb-node/src/api/repos.rs -n sed -n '650,820p' crates/gitlawb-node/src/api/repos.rs | cat -n

Repository: Gitlawb/node

Length of output: 9929

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== clone.rs: encrypted-blobs parsing ===" rg -n "/encrypted-blobs|encrypted-blobs|recover_encrypted_blobs|serde_json::Value" crates/gl/src/clone.rs echo echo "=== clone.rs: recover_encrypted_blobs implementation ===" # Print the function body area (best-effort) around its definition rg -n "async fn recover_encrypted_blobs" -n crates/gl/src/clone.rs | head -n 5 # Use line ranges around likely area sed -n '1,220p' crates/gl/src/clone.rs | head -n 220 echo echo "=== clone.rs: fetch_withheld + schema strictness for withheld-paths ===" rg -n "withheld-paths|fetch_withheld|recover_from_arweave" crates/gl/src/clone.rs

Repository: Gitlawb/node

Length of output: 9912

🏁 Script executed:

#!/bin/bash set -euo pipefail # encrypted-blobs + parsing behavior echo "=== clone.rs: recover_encrypted_blobs + parse_tx_ids ===" sed -n '220,420p' crates/gl/src/clone.rs echo echo "=== clone.rs: WithheldPathsResponse parsing (schema strictness / fail-open) ===" sed -n '180,260p' crates/gl/src/clone.rs echo echo "=== clone.rs: tests relevant to strictness / arweave recovery ===" rg -n "withheld-paths|encrypted-blobs|recover_from_arweave|unauthorized|happy|schema" crates/gl/src/clone.rs

Repository: Gitlawb/node

Length of output: 11990

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '740,820p' crates/gl/src/clone.rs

Repository: Gitlawb/node

Length of output: 3031

Fix Arweave manifest discovery pagination; encrypted-blobs parsing should be schema-strict

Arweave Repo slug normalization matches: the client uses {last-':' owner segment}/{name}, and the node anchors EncryptedManifest.repo with the same owner_did.split(':').next_back() short-form, tagging Repo:{manifest.repo}.

Arweave GraphQL discovery hard-codes first:100; if a repo has >100 encrypted-manifest anchors under the same Repo tag, recovery can miss the newest per-blob manifests and leave still-live blobs unrecovered—needs pagination/iteration until coverage is complete.

/encrypted-blobs parsing is fail-open on schema drift: recover_encrypted_blobs uses Value and defaults blobs to [] via unwrap_or_default(), unlike /withheld-paths which deserializes a typed struct and errors on missing/wrong fields; switch /encrypted-blobs to a strict typed response and fail on missing/wrong types.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gl/src/clone.rs` around lines 415 - 416, The slug normalization and Arweave discovery/parsing need tightening: ensure owner short-form uses the same split logic as the node by keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming slug = format!("{owner_short}/{name}"); update the GraphQL discovery for encrypted-manifest anchors in discover/recovery code to paginate instead of hard-coding first:100 (iterate queries until no more edges or timestamps covered) so recover_encrypted_blobs sees all anchors; change recover_encrypted_blobs to deserialize the /encrypted-blobs response into a strict typed struct (like the existing withheld-paths type) and fail on missing/wrong fields instead of using Value with unwrap_or_default(), ensuring schema-strict parsing and surfaced errors.

coderabbitai · 2026-06-13T03:07:58Z

+    let query = r#"query($repo:String!){transactions(tags:[{name:"App-Name",values:["gitlawb"]},{name:"Schema",values:["gitlawb/encrypted-manifest/v1"]},{name:"Repo",values:[$repo]}],first:100){edges{node{id}}}}"#;
+    let gql_body = serde_json::json!({ "query": query, "variables": { "repo": slug } });
+    let resp = match client
+        .post(format!("{ag}/graphql"))
+        .json(&gql_body)
+        .send()
+        .await
+    {
+        Ok(r) if r.status().is_success() => r,
+        _ => return Ok(vec![]),
+    };
+    let gql: serde_json::Value = match resp.json().await {
+        Ok(v) => v,
+        Err(_) => return Ok(vec![]),
+    };
+    let tx_ids = parse_tx_ids(&gql);
+    if tx_ids.is_empty() {
+        return Ok(vec![]);
+    }
+
+    // 2. Fetch and parse each manifest body, then merge latest-wins per oid.
+    let mut manifests = Vec::new();
+    for tx in tx_ids {
+        let m = match client.get(format!("{ag}/{tx}")).send().await {
+            Ok(r) if r.status().is_success() => r,
+            _ => continue,
+        };
+        if let Ok(parsed) = m.json::<Manifest>().await {
+            manifests.push(parsed);
+        }
+    }
+    let oid_cid = merge_manifests(manifests);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Paginate manifest discovery instead of truncating at 100 transactions.

This fallback only inspects the first 100 manifest txs, then merges per-push manifests to find the latest CID per OID. Once a repo has more than 100 pushes, any private blob whose last seal is older than that window disappears from recovery even if it is still live at HEAD.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gl/src/clone.rs` around lines 428 - 459, The current GraphQL fetch (the local `query` + `gql_body`, use of `client.post(...).json(&gql_body).send().await`, and subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing missing manifests; change the logic to paginate: rewrite the GraphQL query to take variables for "first" and "after" (cursor), then loop POSTing the query (building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor }`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no more pages, then proceed to fetch manifests and call `merge_manifests(manifests)` as before; ensure the loop uses the same `client.post(...).json(&gql_body).send().await` error handling as existing code and retains the existing behavior when responses fail.

beardthelion · 2026-06-13T03:45:33Z

CodeRabbit's pass here spans the whole unmerged stack (this PR is base=main, so it re-reviews all prereq commits). None of the eight findings fall on this PR's actual delta, which only touches blob_paths, ipfs_pin::list_all_objects, and pinata::list_all_objects. They're routed to where they belong rather than folded in here:

seed_bytes zeroization: already tracked in Harden Keypair::seed_bytes: route seed access through the zeroizing wrapper #41
ipfs_pin::cat timeout: already tracked in ipfs_pin::cat has no bounded timeout; a stalled IPFS endpoint hangs fetch/replication #45 (closed Add bounded timeout to ipfs_pin::cat #43/Add bounded timeout to ipfs_pin::cat #44 as dups)
encrypted_pin partial recipient sealing + silent skips (feat: encrypted replication for private subtrees (B1/B2/B3) for #18 #36): filed encrypted_pin: sealing to a partial recipient set leaves authorized readers unrecoverable #47
peer sync downgrading mirrors to Plain on transient error (feat(node): peer partial-mirrors for repos with private subtrees #35): filed Peer sync downgrades still-withheld mirrors to Plain on a transient withheld-paths error #48
gl clone Arweave recovery pagination + schema-strict parsing (feat: encrypted replication for private subtrees (B1/B2/B3) for #18 #36): filed gl clone Arweave recovery: paginate manifest discovery and make /encrypted-blobs parsing schema-strict #49
serve-path withheld-walk short-circuit: filed Serve path: short-circuit withheld walk when no path-scoped rule can withhold #50

kevincodex1 · 2026-06-13T08:23:38Z

hello @beardthelion do you want to be a maintainer of this node repository? I would like to add you as one. let me know if you are interested.

beardthelion · 2026-06-13T14:11:50Z

Thanks @kevincodex1, I'd be glad to. I've enjoyed working on this and I'm happy to take on more responsibility for it.

Couple of things so I know what I'm signing up for: what's the expectation for the role, just merge access, or also triage/releases and general upkeep? And who else maintains it currently?

Either way, count me in.

kevincodex1 · 2026-06-18T13:37:43Z

please rebase and fix conflicts

…adability The origin no longer stores recipient DIDs. Migration v5 replaces the encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag used only to detect a recipient-set change for re-seal. B1 discovery and fetch are now gated by the same repo-readability check the git read path uses, not by per-recipient matching; decryption is gated by the envelope crypto, so a non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it. encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the Arweave manifest tuple drops the now-unused recipient vec. A DB compromise no longer reveals the reader set; recovering it would require brute-forcing candidate DID sets against the keyed tag with the node key.

…rror Address review on the at-rest blinding change: - The encrypted-blobs/replicate listing returned {oid, cid} with no visibility check, so a non-readable repo's blob index was reachable by an unauthenticated caller who guessed {owner}/{repo}. Gate it by the same repo-readability check discovery and fetch use. For the intended case (a public repo with withheld subtrees) the public root keeps this open to peers; only fully non-readable repos are withheld, which is the desired behavior. - encrypt_and_pin treated a recipients_tag DB read error as a cache miss and resealed, causing avoidable IPFS writes during a partial outage; skip and retry on the next push instead. - Correct the get_encrypted_blob doc comment to describe repo-readability access.

…nly pin set blob_paths walked only refs/heads/* and refs/tags/* and skipped silently on a failed git ls-tree, so a blob reachable only through another namespace, or a ref that failed to traverse, could fall out of the withheld set and ship in cleartext. Walk every ref and fail closed on traversal error. The pin enumerators (ipfs_pin, pinata) used git cat-file --batch-all-objects, which includes unreachable/dangling objects that have no path and cannot be classified for withholding. Switch them to git rev-list --objects --all so the pin set matches the reachable graph blob_paths evaluates.

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

beardthelion force-pushed the fix/withholding-ref-scope branch 2 times, most recently from 90dc8d4 to 67afb20 Compare June 18, 2026 17:59

beardthelion added 4 commits June 19, 2026 10:47

chore: remove redundant .gitignore entry (covered by local exclude)

54bcab4

beardthelion force-pushed the fix/withholding-ref-scope branch from 67afb20 to cd5cda4 Compare June 19, 2026 15:49

		let owner_short = owner.split(':').next_back().unwrap_or(owner);
		let slug = format!("{owner_short}/{name}");

Conversation

beardthelion commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

beardthelion commented Jun 13, 2026

Uh oh!

kevincodex1 commented Jun 13, 2026

Uh oh!

beardthelion commented Jun 13, 2026

Uh oh!

kevincodex1 commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

beardthelion commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading